"OpenStreetMap Analytics: Rewarding contributors by tracking OpenStreetMap in real-time" by Marc Farra Live captioning by Norma Miller. @whitecoatcapxg The goal here is to try to have a more conversational feel to the presentation. >> Or not, I guess. I mean ... >> Just gauging the room around here. Who went to the keynote? So who knows about missing maps? OK, great I'll just say a small sentence about missing maps. Basically it's a team of people that are adding vulnerable people on the map, they target areas where the map doesn't exist. So I'm Marc. I'm at Development Seed. This is talk is about rewarding contributors in real time. We worked with the Red Cross to redesign the missing maps website. Missingmaps.org, so if you go there there's a new website if you haven't seen it. and the major contributions that we've made is as user profiles. So can everyone see that in the back? Great. >> >> It's pixelated. I thought that was done on purpose. >> Are all of them pixelated? And these are pixelated, too. Oh, gees, OK, well, you can listen to my voice and follow. Actually, I have a PDF. And maybe I'll just switch to the PDF. >> Oh,! Is it a loose connection? >> >> Yeah,. [applause] >> OK. Thank you, kind stranger. OK, so down Dan here has now a user profile. These are now things that are on the missing maps website. Basically if you've edited in the past few months and use a missing maps hashtag in your chain set, you now have a user profile, you have badges to aspire to, and you can -- you have a user profile, you have badges and you'll see it. It's really nice. >> Another thing that we worked on is leaderboards. So this is for you can now -- if you use a hashtag, you you can have a leader board for that mapathon. My talk is going to be focusing on hashtags and metadata basically the guts of OSM of how we did this. Reward mechanisms like why are we do this and the open source that hopefully you can contribute to. So a changeset in OSM is basically data and metadata. 
When you commit in iD, you basically create two files. You create a data file, which has all the geoinformation, and you have a metadata file, which is also stored and which contains, you know, different types of information.
>> And a lot of people at State of the Map talked about metadata. Toby, for example, showed us an analysis of metadata. And I'll show you what metadata looks like. First of all, this is a changeset: you've got a comment here about auto OSM, mapping building for residents, and then you have that geographic information. You also have things like the use of the iD editor and the locale. This is what's stored in OSM, and you know, for a programmer, this is very exciting to me. For others it's just a bunch of gibberish. And it has all that data that I showed in the previous slide. It has that tag which has the comment, the locale, the imagery used, created by, and this is stuff that you can grab from OSM and start analyzing. And we do that because when you analyze geographic information in OSM there are a ton of tools to do that, but the metadata is more a statement about the community and the community's growth. So analyzing the metadata, using various tools, can give you an idea about how OSM is moving forward.
Now, let's talk about hashtags. Hashtags kind of split the community in their use, and there's some talk about that on the mailing list. Don't look at the mailing list. Hashtags are really useful. [laughter] So why are they useful? Hashtags are spatially unbounded. As in, if you use a hashtag, you create a grouping of changesets that is not constrained to a spatial community. Hashtags, for example, track events.
So if you're on the tasking manager and you use, like, #HOT OSM Nepal earthquake, you group all the changesets that are related to the Nepal earthquake, and then you can analyze that and, for example, create a tool like Jennings did for the Nepal earthquake, of changesets per hour, to analyze a community response. We see large spikes around HOT activations. This is a little tool that Development Seed built that's tracking trending OSM hashtags in the last 24 hours. Now, for example, you can see you've got large hashtags by Missing Maps and Peace Corps, these are recurring hashtags, as well as activations or tasks in the tasking manager.
Hashtags then track groups. What I mean by groups is large data teams. So if you're, for example, JPMorgan Chase and you want to track your users, you know, you can use hashtags like JPMC summer hire. I'm not sure what that's tracking, but for the partner page for JPMC on the Missing Maps website, that's really useful for them to track their contributors over time. We can see that now they have 601 contributors, and they mapped these buildings and roads, and we get that through a combination of the metadata and the feature data, which I'll talk about. This is another tool, by Pascal Neis, who we've used as an inspiration for a lot of the tools we've built. You can consider Missing Maps as a data team. It's one of the largest data teams right now, and by analysis of hashtags, you know, you can see there are at least 10,000 contributors. These are the bounding boxes of, you know, some of the Missing Maps hashtags over time.
So we talked about hashtags and metadata. I want to talk about feedback loops. Now that we've analyzed this community, how do we go back and make it a bit better? So how do we make better mappers? We can do this through analysis of the metadata, but rewarding contributors is a cool aspect that, you know, we can strive for.
Once we reward contributors, we want to encourage better mapping and encourage validation. And that's sort of why we built the new Missing Maps website in that way. Mappers are, you know, some of them are one-off mappers, some of them are, you know, long-time mappers, but we want to stabilize that framework a bit.
So what motivates contributors? We've got three types that we look into regularly, and, you know, their motivations over time. You've got idealistic mapping: idealistic mapping is mapping your own community, and striving for OSM as, like, the best open source community in the world. You've got reactive mapping, and this is reacting to events like the Nepal earthquake. And you've got institutional mapping, which we've also touched on at this conference, where you've got these large data teams mapping within their institution. And you know, something Dale told me a few hours ago is that we want to strive towards a complete mapper, so I put this slide up. A complete mapper is a mapper that wants to map locally in their neighborhood, but also participate in the greater social ideal of OSM.
And through our research, we looked into what motivates a mapper. We looked at a taxonomy of what motivates users on different platforms, and that research showed that there are four basic types of motivation. Well, at least that we know about. Immersion. Immersion means feeling part of the mapping effort. We do that by creating a user profile where we have, oh, you belong to these projects. You've mapped on all these dates, so we give them a calendar to show, you know, how long they've mapped. We show them their latest badge. We show them, you know, quick statistics. Achievement is another motivator, and this is more for completionist types. They want to get all the badges, and when I say all the badges, we want to look at that dichotomy of local and social mapping.
We want to reward people that, on the ground, add, you know, GPX traces, or that validate on the tasking manager, and there are badges for that, but there's also the mapathoner badge for participating in mapathons. And we also show badge progress, so that they know what badges they're going to get. Cooperation, and this goes without saying for Missing Maps. Cooperation is essential in, you know, creating mapathons and spaces for people to map together, and this appeals to social user types. And competition. This is for, you know, other types of users, and we're not talking about destructive competition. We're talking about, you know, the healthy type of competition that comes from pitting a school versus another school to map out more of the map. Here I'm showing three institutions: Missing Maps, which has a million edits, MapGive, and Peace Corps. They use the platform in different ways, but hopefully that number is going to grow.
So as a recap, we have these four types of motivation that we try to target with the new Missing Maps website, with these user profiles and leaderboards: immersion, achievement, cooperation, and competition, to get better retention for these mappers. But we also want to close the feedback loop. The feedback loop should strive to get mappers to map once and then come back to other mapathons. So maybe, for example, they get these badges, or they map the task, and then we send them a thank you, or we send an email reminding them to come back to the platform. I'm sure that there are ways that we can do this in a noninvasive and non-annoying way, like other apps do. We're open to feedback on how to close the feedback loop. We don't do it just yet, but I'd like to revisit the data in six months and see: have the new Missing Maps website and, you know, this strategy of using hashtags affected the community in some way?
Now, I'm going to switch gears a bit and go more into programming and developer mode.
Who here considers themselves a developer? And not? OK, so for those people, if you somehow miss something, please stop me and I'll explain it. This system stands on the shoulders of giants here: Overpass and the planet files. Who knows Overpass? I think a lot of people in the room know Overpass. Overpass creates these augmented diffs. It tracks OSM every minute, and every minute creates these very, very large commits which have the geographic information before and after the commit. Now, this is not found in other tools in OSM, and we're very grateful to the Overpass developers for developing the tool that creates the augmented diffs. Planet is a repository of metadata files. We talked about the metadata files and why they're important. They contain, you know, comment history and hashtags, and what we do is we take the metadata files and combine them with the augmented diffs to create this rich diff, or what some other people call real changesets, because they contain the comment or the metadata that we use for filtering the hashtags, but also contain the feature data needed for creating building counts or kilometers of roads, and that's what we use for the system.
>> So, a little system diagram. It's in ASCII for fun. So there we have planet as a data source, and Overpass as a data source, and they go into a piece of software that Development Seed built called planet-stream, which spits out the real diff, or sorry, rich diff. This rich diff goes into a cache which stores map data for the leaderboards and trending, and it also goes into a cache queue that calculates aggregate statistics. And then we combine all of these and serve an API, which is the Missing Maps API, and it's open and anyone can use it for their leaderboards. If you don't like how the leaderboards look, you can make your own leaderboard with the Missing Maps data.
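The join described above, metadata from the planet files combined with feature data from the augmented diffs and then filtered by hashtag, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual planet-stream code: the record shapes, the field names (`id`, `user`, `comment`, `changeset_id`, `kind`, `length_km`) and the function names are all hypothetical and chosen only for the example.

```python
import re
from collections import defaultdict

# Hashtags are read from the free-text changeset comment.
HASHTAG_RE = re.compile(r"#([\w-]+)")


def extract_hashtags(comment):
    """Return the sorted, lowercased hashtags found in a changeset comment."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(comment or "")})


def build_rich_diffs(metadata, augmented_diff):
    """Join changeset metadata with feature data by changeset id.

    metadata: list of dicts with hypothetical keys id, user, comment.
    augmented_diff: list of dicts with hypothetical keys changeset_id,
    kind ("building" or "road") and length_km (roads only).
    """
    features_by_changeset = defaultdict(list)
    for feature in augmented_diff:
        features_by_changeset[feature["changeset_id"]].append(feature)

    return [
        {
            "id": changeset["id"],
            "user": changeset["user"],
            "hashtags": extract_hashtags(changeset.get("comment")),
            "features": features_by_changeset.get(changeset["id"], []),
        }
        for changeset in metadata
    ]


def leaderboard(rich_diffs, hashtag):
    """Aggregate building counts and road kilometers per user for one hashtag."""
    totals = defaultdict(lambda: {"buildings": 0, "road_km": 0.0})
    for diff in rich_diffs:
        if hashtag.lower() not in diff["hashtags"]:
            continue
        for feature in diff["features"]:
            if feature["kind"] == "building":
                totals[diff["user"]]["buildings"] += 1
            elif feature["kind"] == "road":
                totals[diff["user"]]["road_km"] += feature.get("length_km", 0.0)
    # Rank by buildings first, then by road kilometers.
    return sorted(
        totals.items(),
        key=lambda item: (item[1]["buildings"], item[1]["road_km"]),
        reverse=True,
    )
```

Calling `leaderboard(build_rich_diffs(metadata, augmented_diff), "missingmaps")` would return `(user, totals)` pairs for changesets whose comment carries that hashtag. Keeping the join separate from the aggregation mirrors the split in the talk between producing rich diffs and computing statistics from them.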
Of course, every single piece of that is open source, and we need contributors to keep it stable. And tomorrow during the hack day we're going to talk about these components. We are going to talk about the rich diffs and aggregate statistics, so I invite you to come around and attend the hack day, and we will talk about it. We will talk technical, rich diffs, but we also want to tackle the use cases for the statistics on a higher level. So what do we need from these statistics? What is useful in a mapathon? So please come tomorrow and let's discuss it. So here are the repos. We've got the OSM-stats repo, and we're not tied to the name, because everything is called OSM-stats, so if you find a better one, let us know. Then the statistics, the API, which is the Missing Maps API, and the website.
What's next? We just turned on the system for all hashtags, so now it's looking at all hashtags instead of just Missing Maps. But we're still monitoring it, so don't create a ton of mapathons in the next week. More stable diffs: a few of these components are not as resilient to hiccups in the OSM database. For example, Overpass is great at creating the augmented diffs, but if there's a problem in the original database and Overpass goes down, then the system can lag for a few hours until Overpass comes back up. One thing that we want to talk about tomorrow is how to make Overpass more resilient. Real quick, I just want to finish. We want to integrate this back into OSM, into the infrastructure, and we want to talk about closing the loop.
So real quick, we talked about hashtags and metadata, reward mechanisms, and open source analytics, and I'm inviting everyone to contribute: contribute to mapathons, contribute to Missing Maps. That's my handle, @kamicut, and I'd love to have discussions about this. Do we have time for questions? If you don't get your question in, you can always find me.
AUDIENCE MEMBER: Hi, Marc, what do you think about opening the system not just to Missing Maps but to the entire OpenStreetMap?
>> Yes, so we want to look at that, and I touched a bit on it: we want to get all hashtags. And now, hashtags are used in changeset comments. Ideally, these hashtags would be an extra tag in the metadata that you could add with iD, but this is a good way to, you know, track these events and groups and data teams.
AUDIENCE MEMBER: Yeah, I wasn't just thinking about hashtags, but I want to earn badges without participating in Missing Maps.
>> Right, and like I said, you can have your own data team, you can have your own hashtag, and you can run your own API. Everything's open source, you can read the documentation for that, run your own API, and have that feed into your mapathon. Like I said, we're looking at possible ways that this could happen inside OSM, so that this is, like, your user profile, rather than just a Missing Maps profile, and then that user profile has badges and statistics, and it would be great if we could have that integration. And, you know, if you know a way to get there, I'd love to talk about it.
>> We have time for one more, I think. Anyone else?
>> Yeah?
AUDIENCE MEMBER: Do you have any early anecdotal evidence that the badges are increasing user contribution or increasing user retention?
>> Right, so it's a bit early for that. We've only had the system deployed for the past couple of months, well, without bugs, right? We want to revisit this in six months. We know that the people that are already contributing to Missing Maps want to contribute because they feel, you know, some sort of satisfaction that they're contributing, but we have a lot of user profiles, like the one that I showed, with people that map every day.
And for that person, you know, that mapped every day, I want to look at their history before we deployed and see if they also mapped every day, so that could be a good way to do it. Is Piacco Decay in the room? No? Well, good job.
>> We need to leave it there, because we need to attend the closing session, but thank you.
>> Great. Thank you. [applause]